Transcription of DNA to RNA by DNA-dependent RNA polymerase (RNAP) is the first step of gene expression and 
a major regulation point. Bacteriophages hijack their host's transcription machinery and direct it to serve their 
needs. The gp39 protein encoded by Thermus thermophilus phage P23-45 binds the host's RNAP and inhibits 
transcription initiation from its major "-10/-35" class promoters. Phage promoters belonging to the minor 
"extended -10" class are minimally inhibited. We report the crystal structure of the T. thermophilus RNAP 
holoenzyme complexed with gp39, which explains the mechanism for RNAP promoter specificity switching. gp39 
simultaneously binds to the RNAP β-flap domain and the C-terminal domain of the σ subunit (region 4 of the σ 
subunit [σ4]), thus relocating the β-flap tip and σ4. The ~45 Å displacement of σ4 is incompatible with its binding 
to the -35 promoter consensus element, thus accounting for the inhibition of transcription from -10/-35 class 
promoters. In contrast, this conformational change is compatible with the recognition of extended -10 class 
promoters. These results provide the structural bases for the conformational modulation of the host's RNAP 
promoter specificity to switch gene expression toward supporting phage development for gp39 and, potentially, 
other phage proteins, such as T4 AsiA. 
Transcription of DNA to RNA is the first step of gene 
expression and is accomplished by DNA-dependent RNA 
polymerase (RNAP). The bacterial RNAP core enzyme 
is an ~400-kDa protein complex with a crab claw-like 
shape consisting of five subunits, α2ββ′ω (Zhang et al. 
1999). For transcription initiation, the RNAP core enzyme 
binds one of several promoter specificity σ subunits 
to form the holoenzyme α2ββ′ωσ (Murakami et al. 2002b; 
Vassylyev et al. 2002). Typically, the holoenzyme recognizes 
promoters through specific interactions between 
two DNA-binding domains in σ, namely σ2 and σ4 (regions 2 and 
4 of the σ subunit), and consensus promoter elements 
around the -10 and -35 positions, respectively, relative 
to the transcription start site (Fig. 1A; Campbell et al. 
2002; Murakami et al. 2002a; Young et al. 2002; Feklistov 
and Darst 2011; Zhang et al. 2012). The interaction 
between σ and the -10/-35 promoter elements is essential 
for the subsequent formation of the open promoter 
complex in which the dsDNA is melted around positions 
-12 to +1, and the template DNA strand is loaded 
into the RNAP nucleic acid-binding channel to initiate 
transcription. 

In addition to the major -10/-35 class promoters, 
there are the minor "extended -10" class promoters, 
which have the extended -10 element (the -10 element 
plus a TG motif located immediately upstream) but lack 
a discernible -35 promoter element (Fig. 1A; Mitchell 
et al. 2003). RNAP seems to select these promoters 
through extensive interactions between the σ2 domain 
and the extended -10 element and does not depend on 
the σ4 domain. 

Figure 1. The structure of the RNAP holoenzyme•gp39 complex. (A) Schematic depictions of the -10/-35 and extended -10 
promoters and the effect of gp39 on transcription. (B-D) Overall structure of the T. thermophilus RNAP holoenzyme•gp39 complex 
(holo•gp39) in three orientations. (E) Close-up view of the β-flap domain, σ factor, and gp39. 

Transcription initiation is the major point of gene 
regulation by cellular factors. Bacteriophages employ 
their own proteins to appropriate the host's transcription 
system and direct it to serve their needs. Some of these 
proteins interact directly with the host's RNAP and 
switch its promoter specificity to modulate the entire 
transcriptome in favor of phage development (Nechaev 
and Severinov 2003). The σ4 domain and its binding site, 
the β-flap domain of the RNAP core enzyme, are among 
the preferred targets for phage transcription regulators 
(Dove et al. 2003; Lambert et al. 2004; Nechaev et al. 2004; 
Baxter et al. 2006; Yuan et al. 2009; Twist et al. 2011; 
Osmundson et al. 2012). These regulators directly bind to 
the interface of either σ-DNA, RNAP-σ, or RNAP-DNA 
and thus interfere with the RNAP-DNA interaction. 

In Thermus thermophilus bacteriophage P23-45, the 
middle gene product, gp39, is a key factor that probably 
controls the switching of transcription from the host T. 
thermophilus to the phage genes (Minakhin et al. 2008; 
Berdygulova et al. 2011, 2012). This ~16-kDa protein binds 
to the β-flap domain of the T. thermophilus RNAP holoenzyme 
and strongly inhibits transcription initiation from 
the -10/-35 class promoters. In contrast, gp39 has only 
a minor effect on transcription from the P23-45 middle/late 
promoters, which belong to the extended -10 class (Fig. 1A; 
Minakhin et al. 2008; Berdygulova et al. 2011). To elucidate 
the structural basis of the promoter specificity switching by 
the phage-encoded protein, we solved the crystal structure 
of the T. thermophilus RNAP holoenzyme bound to gp39 
and performed structure-based biochemical analyses. 
Unexpectedly, our study revealed that gp39 switches the 
promoter specificity by modifying the σ4 orientation of 
RNAP without competitively dissociating the entire 
σ factor from RNAP or blocking σ4 binding to DNA. 

Results 

Structure determination 

We determined the crystal structure of gp39 bound to the 
T. thermophilus RNAP holoenzyme (holo•gp39) at 3.6 Å 
resolution (Fig. 1B-E; Table 1). We also obtained single 
wavelength anomalous dispersion (SAD) data from crystals 
of the holoenzyme complex with selenomethionine 
(SeMet)-containing gp39 and precisely located the methionine 
residues in gp39 based on the Se anomalous peaks 
(Fig. 2A). Consistent with previous biochemical data 
(Berdygulova et al. 2012), the gp39 molecule primarily 
contacted the RNAP β-flap domain. Another gp39 molecule 
was present in the crystalline asymmetric unit, but 
its interaction with RNAP is probably functionally insignificant 
(Supplemental Fig. 1). Therefore, we focused 
on the first gp39 molecule, which directly binds the β 
flap. We also determined higher-resolution crystal structures 
of the gp39 variants bound to the β-flap domain 
fragment of RNAP (β residues 703-830) (Fig. 2B-D), 
which agree well with the interaction between the β flap 
and the first gp39 molecule in holo•gp39. For the σA 
subunit, models were built for the σ2 and σ4 domains and 
the σ3-4 linker but not for the remaining parts because 
of missing electron density. Significant conformational 
changes were observed in the RNAP structure, as described 
below (and in Supplemental Fig. 2). 

The gp39 structure and its binding mode 
to the RNAP β-flap domain 

The gp39 structure consists of the gp39 core (residues 
1-107) and the C-terminal tail (residues 119-141), which 
are connected by a linker (residues 108-118, mostly 
disordered) (Figs. 1D,E, 2A). The gp39 core comprises 
the central β sheet and the N-terminal α helix (Fig. 2) and 
binds primarily to the RNAP β-flap domain. A small 
protuberance from the RNAP flap β sheet (flap protuberance, 
residues 723-740) is the major gp39 core-binding 
site, and the interaction relies on shape complementarity 
and extensive hydrophobic/hydrophilic interactions 
(Fig. 3A-D). In gp39, Asp94, Asn95, Ile97, and His99 form 
a hydrogen-bonding network with Arg721, Thr723, and 
Asp725 in the flap (corresponding to Glu849, Thr851, and 
Asp853, respectively, in the Escherichia coli RNAP β 



Table 1. Data collection and refinement statistics 

Sample                      Holoenzyme•gp39          Holoenzyme•gp39          β-Flap•gp39 (6-109)      β-Flap•gp39 (6-132)

Data collection
Data set                    Native                   SeMet (gp39)             SeMet (gp39)             Native
                            (SPring8, BL41XU)        (PF, NE3A)               (SPring8, BL32XU)        (PF, NE3A)
Wavelength                  1.0000 Å                 0.9780 Å                 0.9780 Å                 1.0000 Å
Space group                 P3₂21                    P3₂21                    P4₁2₁2                   P4₁2₁2
Cell dimensions             a = b = 294.4 Å          a = b = 294.4 Å          a = b = 99.2 Å           a = b = 213.8 Å
                            c = 223.3 Å              c = 223.6 Å              c = 117.2 Å              c = 234.6 Å
                            α = β = 90°, γ = 120°    α = β = 90°, γ = 120°    α = β = γ = 90°          α = β = γ = 90°
Resolution                  20 Å-3.6 Å               50 Å-5.0 Å               50 Å-2.35 Å              50 Å-3.3 Å
                            (3.73 Å-3.60 Å)          (5.09 Å-5.00 Å)          (2.39 Å-2.35 Å)          (3.36 Å-3.30 Å)
Rsym (a)                    0.218 (0.557)            0.264 (0.539)            0.140 (0.407)            0.175 (0.493)
Mean I/σ                    7.1 (2.6)                8.6 (3.5)                19.8 (6.5)               14.2 (5.2)
Completeness                99.5% (98.9%)            99.6% (99.7%)            99.5% (100.0%)           99.9% (100.0%)
Redundancy                  6.0 (4.2)                10.8 (8.3)               17.7 (16.6)              10.6 (9.8)
Unique reflections          129,380 (12,728)         48,287                   24,886                   82,014

Refinement statistics
Resolution                  20 Å-3.6 Å                                        50 Å-2.35 Å              50 Å-3.30 Å
Number of reflections       127,431/6370                                      46,341/2380 (b)          81,938/4107
  (total/test)
Rwork                       0.250                                             0.181                    0.223
Rfree                       0.279                                             0.226                    0.249
Number of atoms             28,954                                            3831                     22,026
Protein atoms               28,953                                            3730                     22,026
Water                       0                                                 101                      0
Zn2+ ions                   1                                                 0                        0
RMSDs
Bond length                 0.006 Å                                           0.008 Å                  0.009 Å
Bond angles                 1.3°                                              1.1°                     1.2°
Ramachandran plot
Most favorable              82.6%                                             92.7%                    93.1%
Allowed                     16.1%                                             7.3%                     6.8%
Generously allowed          1.3%                                              0.0%                     0.1%
Disallowed                  0.0%                                              0.0%                     0.0%

(a) $R_{\mathrm{sym}} = \sum_{hkl}\sum_{i} |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_{i} I_i(hkl)$, where $\langle I(hkl)\rangle$ is the mean intensity of multiple $I_i(hkl)$ observations of the symmetry-related reflections. 
(b) Anomalous diffractions. 
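
As an illustration of the Rsym statistic defined in footnote a of Table 1, the following minimal Python sketch sums the absolute deviations of each observation from the mean of its symmetry-equivalent group and divides by the summed intensities. The function name `r_sym`, the input layout, and the toy intensities are assumptions made for illustration only; they are not data or software from this study, and the sketch assumes that symmetry-related reflections have already been mapped to a shared hkl key.

```python
from collections import defaultdict


def r_sym(observations):
    """Return R_sym for an iterable of ((h, k, l), intensity) pairs.

    Symmetry-related reflections are assumed to share the same hkl key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0    # sum over hkl and i of |I_i(hkl) - <I(hkl)>|
    denominator = 0.0  # sum over hkl and i of I_i(hkl)
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical toy observations (not taken from the paper).
toy = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 1, 2), 50.0), ((0, 1, 2), 40.0), ((0, 1, 2), 45.0),
]
print(round(r_sym(toy), 4))
```

For the toy data the numerator is 20 and the denominator is 345, so the statistic is about 0.058; lower values indicate better agreement among symmetry-related measurements.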
Figure 2. The gp39 structure. (A) The holoenzyme-bound gp39. 
The N-terminal core domain (gp39 core) is colored blue, and the 
gp39 C-terminal tail is colored red. The anomalous difference 
map calculated with the Se SAD data is shown as a green mesh. 
(B) The structure of the complex between a truncated gp39 
variant (residues 6-109) and the RNAP β-flap domain fragment 
(β residues 703-830). (C) The structure of the complex between 
another gp39 variant (residues 6-132) and the β-flap domain 
fragment, with the 2Fo - Fc electron density map contoured at 
1.0σ. (D) The same structure as in B, overlaid with the 2Fo - Fc 
electron density map contoured at 1.0σ. 

subunit). The tip of the gp39 β sheet (residues 59-63) and 
the loop following the N-terminal helix (residues 16-21) 
wrap the loop of the β-flap domain (β 737-740), where 
Arg61 in gp39 forms a salt bridge with Glu739 (E. coli 
Glu867) in the β flap. Bacterial two-hybrid experiments 
confirmed the gp39•β-flap interaction (Supplemental Fig. 
3). The replacement of Asp725 in the β flap, at the middle 
of the interface with gp39, with a bulky Trp residue (β 
D725W) abolished the interaction. The reciprocal double 
substitution of Trp and Phe for Asp94 and Asn95, respectively, 
in gp39 (D94W/N95F), which would disrupt 
their interactions with Arg721, Thr723, and Asp725 in 
the β flap, also strongly impaired the interaction. 

To determine whether gp39 binding to the β-flap 
domain is functionally relevant, we analyzed the transcription 
inhibition of an E. coli RNAP variant on the 
-10/-35 and extended -10 class promoters (Fig. 3F). The 
wild-type E. coli RNAP is naturally resistant to gp39 due 
to the inability of gp39 to bind to the RNAP. However, the 
phage protein efficiently bound to the E. coli RNAP 
mutant bearing the T. thermophilus β flap instead of 
the E. coli β flap (hybrid Eco-tthflap) (Fig. 3E, lanes 2,2′). 
Similarly to T. thermophilus RNAP, gp39 inhibited transcription 
by the hybrid RNAP on the -10/-35 promoters 
but not on the extended -10 promoters of the phage (Fig. 
3G). Equally efficient transcription inhibition by the 
hybrid E. coli RNAPs was observed with the E. coli σ70 
and T. thermophilus σA subunits, demonstrating that σ 
does not contribute to the species specificity of gp39 
action (Supplemental Fig. 4A). The β D725W substitution 
in the hybrid RNAP abolished gp39 binding (Fig. 3E, lanes 
3,3′) and relieved the inhibition (Fig. 3G). Thus, the gp39-binding 
site in the β-flap domain is the major determinant 
for RNAP inhibition. 

The gp39 C-terminal tail dramatically relocates σ4 

In the holo•gp39 structure, the C-terminal tail of gp39 
interacts with the gp39 core, the β-flap tip, and σ4 (Figs. 
1E, 3A). The gp39 C-terminal helix (residues 122-132), 
the β-flap tip helix, and two σ4 helices (H1 and H4, 
residues 342-359 and 393-410, respectively) form the 
hydrophobic core of this tripartite interaction. The interface 
is formed between Tyr125, Leu128, Met129, 
Ala131, Met132, and Leu34 in gp39; Leu350, Leu354, 
and Ala357 in σ4 (corresponding to Leu540, Thr544, and 
Val547, respectively, in E. coli σ70); and Leu774, Ile777, 
and Phe778 in the β flap (corresponding to Leu902, Ile905, 
and Phe906, respectively, in E. coli β) (Fig. 3B). Importantly, 
the C-terminal tail of gp39 (6-132) was not 
observed in the structure bound to the β-flap domain 
fragment (Fig. 2C), demonstrating that the gp39 tail is 
fixed through its interaction with σ4. Bacterial two-hybrid 
data confirmed the weak interaction of gp39 with 
σ4 and revealed that this interaction is abolished by the 
deletion of the gp39 C-terminal helix (Supplemental Fig. 
3). Thus, the mobile C-terminal helix of gp39 primarily 
interacts with σ4. 

gp39 binding dramatically repositions σ4 and the β-flap 
tip helix in the holoenzyme (Fig. 4A). The C-terminal tail 
of gp39 plays a key role in this rearrangement. In the 
holoenzyme structure without gp39, the C-terminal H5 
helix of σ4 forms hydrophobic interactions with the β-flap 
tip helix (Vassylyev et al. 2002). In the holo•gp39 complex, 
the gp39 C-terminal helix displaces σ4 H5 (Fig. 4B) 
and interacts with σ4 and the β-flap tip, causing σ4 to 
rotate together with the β-flap tip, while the gp39 core 
stays bound to the β-flap protuberance (β 723-740). Consequently, 
σ4 and the β-flap tip are concomitantly rotated 
by ~60° relative to the rest of the flap domain, thus 
shifting the position of σ4 by ~45 Å. Although the 
displaced H5 helix of σ4 is not visible in the electron 
density, the relative positions between σ4 and the β-flap 
tip remain almost unchanged (Fig. 4B). 
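
The two reported numbers, a rotation of about 60° and a shift of about 45 Å, can be related by simple geometry. The sketch below is a back-of-the-envelope illustration only: it assumes an idealized rigid rotation about a single fixed axis, which the text does not state, and the function names are hypothetical. Under that assumption, a point at distance r from the axis moves along a chord of length 2r·sin(θ/2), so for θ = 60° the displacement equals r, implying a lever arm of roughly 45 Å.

```python
import math


def chord_displacement(radius_angstrom, angle_deg):
    """Straight-line distance moved by a point rotated by angle_deg
    about an axis located radius_angstrom away (idealized rigid rotation)."""
    return 2.0 * radius_angstrom * math.sin(math.radians(angle_deg) / 2.0)


def radius_for_displacement(displacement_angstrom, angle_deg):
    """Lever arm that would produce the given displacement for the given rotation."""
    return displacement_angstrom / (2.0 * math.sin(math.radians(angle_deg) / 2.0))


# Reported approximate values: ~60 degree rotation, ~45 Angstrom shift of sigma-4.
lever_arm = radius_for_displacement(45.0, 60.0)
print(f"implied lever arm: {lever_arm:.1f} A")
print(f"check displacement: {chord_displacement(lever_arm, 60.0):.1f} A")
```

The actual movement in the crystal structure may combine rotation with translation, so this estimate should be read only as a consistency check on the two quoted values.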

The gp39 C-terminal tail is essential 
for transcription inhibition 

To investigate the significance of the gp39 C-terminal tail, 
we examined the transcription inhibition activity of a mutant 
gp39 lacking the C-terminal tail (gp39 1-113). In vitro 
transcription experiments revealed that, in agreement 
with the structure, the deletion of the C-terminal tail 
strongly impairs the ability of gp39 to inhibit transcription 
(Fig. 4C) but only minimally affects gp39 binding to 
the RNAP core and holoenzymes (Supplemental Fig. 6). 
These data strongly suggested that σ4 displacement by 
the gp39 C-terminal tail is the major mechanism of 
transcription inhibition on promoters containing the -35 
element. 

Figure 3. The interaction between RNAP holoenzyme and gp39. (A) The gp39, β flap, and σ factor structures in the holo•gp39 
complex. (B-D) Detailed views of the molecular interactions. (E) Analysis of gp39 interactions with Eco-tthflap RNAP (wild type or β 
D725W mutant). (Left panel) The Eco-tthflap RNAP holoenzymes containing E. coli σ70 were mixed with gp39 and resolved by native 
PAGE. (Right panel) The bands containing RNAP were excised from the gel and analyzed by denaturing SDS-PAGE. (Lane 3) The 
Eco-tthflap RNAP containing the D725W mutation migrated as a doublet on the native gel. Although the reason for the band separation 
remains unknown, the upper and lower band fractions both contain stoichiometric amounts of the RNAP subunits and lack gp39 
(Supplemental Fig. 5). The D725W variant Eco-tthflap RNAP exhibits transcription activity comparable with that of the wild type (as 
shown in G). (F) Promoters used in this study. (P68M) Middle promoter of phage P23-45 belonging to the extended -10 class; (T5 N25) a 
model -10/-35 promoter. The positions of the -35, -10, and TG (extended -10) elements and the starting point of transcription (+1) 
are shown. (G) Effects of gp39 on the Eco-tthflap RNAP activity on -10/-35 (T5 N25) and extended -10 (P23-45 P68M) promoters. The 
position of the 3-nucleotide abortive transcription products is indicated. 

Besides its activity as an initiation inhibitory factor, 
gp39 also acts as a potent transcription anti-terminator 
during transcription elongation (Berdygulova et al. 2012). 
However, the deletion of the gp39 C-terminal tail did not 
affect the gp39 anti-termination activity (Fig. 5). Thus, 
the anti-termination activity by gp39 does not require its 
C-terminal tail and is based on a mechanism distinct 
from that inhibiting initiation. 

gp39 prevents stable promoter complex formation 

T. thermophilus RNAP forms highly unstable promoter 
complexes, which exist in rapid equilibrium with the free 
holoenzyme and DNA. To determine which step of 
transcription initiation is targeted by gp39, we performed 
order-of-addition experiments with the hybrid Eco-tthflap 
RNAP, which forms stable promoter complexes typical of 
E. coli RNAP (Supplemental Fig. 4B). gp39 inhibited 
Eco-tthflap RNAP transcription only when bound before the 
DNA and had no effect on preformed promoter complexes 
(Fig. 4D). Thus, the interaction of σ4 with the -35 
promoter DNA element counteracts the action of gp39, 
likely by fixing the σ4/β-flap tip in the "proper" position. 

If gp39 targets the interaction of σ4 with the -35 
element, then it should inhibit an early stage in the 
promoter complex formation pathway. Indeed, potassium 
permanganate probing revealed that gp39 inhibited the 
open complex formation on a -10/-35 promoter (Fig. 4E, 
lanes 3,4), and the inhibition was strongly dependent on 
the gp39 C-terminal tail (Fig. 4E, lanes 5,6). On the other 
hand, gp39 exerted minimal effects on an extended -10 
promoter (Supplemental Fig. 7C). To detect possible gp39-induced 
changes in the RNAP-promoter contacts, we 
performed DNase I and ExoIII footprinting analyses of 
promoter complexes in the absence and presence of gp39 
(Supplemental Fig. 7). Eco-tthflap RNAP generated a clear 
footprint on the promoter DNA, corresponding to the open 
complex. In contrast, the footprint almost completely 
disappeared in the presence of gp39. Thus, gp39 inhibits 
an early step in the open complex formation pathway, 
probably by preventing the interaction of σ4 with the -35 
element during the initial stage of promoter recognition. 
Figure 4. The mechanism of transcription inhibition by gp39. 
(A,B) Superimposition of holo•gp39 and the free holoenzyme 
(Protein Data Bank [PDB] 1IW7; green) (Vassylyev et al. 2002). 
(A) The core modules of the two RNAP structures were superimposed 
by minimizing the root mean square deviation (RMSD) 
between the Cα atoms. (B) Superimposition of the σ4 domains of 
the two structures. (C) Transcription inhibition by wild-type 
gp39 (wt) and its truncated variant lacking the C-terminal tail 
(1-113) on a -10/-35 promoter (T5 N25). (D) Analysis of 
transcription inhibition by gp39 added either before (lane 2) or 
after (lane 3) the promoter DNA. Black triangles indicate order 
of addition. (E) Analysis of gp39 (wild type [wt] and 1-113) on 
promoter melting by potassium permanganate probing. Thymine 
positions in the melted promoter region (numbered 
relative to the transcription starting point) are indicated on 
the right. 

Docking model with promoter DNA 

To further examine the inhibition of promoter complex 
formation, we docked promoter DNA to holo•gp39 (Fig. 
6A; Supplemental Fig. 8). The superimposition of the 
holo•gp39 and Thermus aquaticus closed promoter complex 
structures revealed that gp39 relocates the σ4 domain 
to the opposite side of the DNA, far away from the 
-35 promoter element. This clearly explains why gp39 
prevents complex formation with the -10/-35 promoters. 
However, gp39 binding does not influence the σ 
region 2 (σ 74-261, σ2), which is responsible for the -10 
element recognition. Consequently, it is likely that gp39 
does not interfere with σ2 binding to the -10 region, and 
thus gp39 allows transcription from promoters that depend 
on the extended -10 element. Although the superimposition 
suggested that a clash occurs between σ4 and 
DNA (region of -19 to approximately -26), the steric 
hindrance could be avoided by an ~15° rotation of the 
DNA (Fig. 6B). In this new orientation, the promoter spacer 
region (region of -19 to approximately -32) is closer to the 
β′ zipper and zinc finger domains of the β′ subunit, which 
are critical for transcription from the extended -10 promoters 
(Yuzenkova et al. 2011). Therefore, the model 
reasonably explains why gp39 does not interfere with 
transcription from the extended -10 class promoters. 

Figure 5. Transcription anti-termination by gp39. (A) Transcription 
reactions were performed on a DNA fragment containing 
the T7A1 promoter followed by the tR2 terminator of 
phage λ, as described previously (Berdygulova et al. 2012). (B) 
Transcription anti-termination efficiencies by T. thermophilus 
RNAP with wild-type gp39 and its truncated variant lacking the 
C-terminal tail (1-113) were analyzed. The positions of the 
starting 21-mer RNA and terminated (tR2) and full-length run-off 
(RO) RNAs are indicated on the left. 

Discussion 

Many crystal structures of bacterial, archaeal, and eukaryotic 
RNAPs have been reported (Zhang et al. 1999; Cramer 
et al. 2001; Hirata et al. 2008; Spahr et al. 2009; Murakami 
2013; Zuo et al. 2013). However, only a few structures of 
RNAP complexes with transcription factors are available 
(Murakami et al. 2002b; Vassylyev et al. 2002; Kostrewa 
et al. 2009; Wang et al. 2009; Liu et al. 2010; Tagami et al. 
2010; Cheung and Cramer 2011). In this study, we solved 
the structure of the T. thermophilus RNAP holoenzyme 
(RNAP + σ) bound to the phage-encoded protein gp39. The 
structure revealed the complete molecular view of an 
external transcription factor modifying the holoenzyme 
conformation to switch the promoter specificity of the 
host's RNAP (Fig. 6C). 

Figure 6. Structural basis of promoter switching by gp39. (A) 
Superimposition of holo•gp39 and the T. aquaticus closed promoter 
complex (PDB 1L9Z) (Murakami et al. 2002a) on the 
clamp module and σ subunit. The latter structure shows only σ 
(moss green) and the promoter DNA (slate). The -10 and -35 
elements are highlighted in yellow. (B) Model of an open 
promoter complex of holo•gp39 on DNA containing an extended 
-10 promoter. The DNA is reoriented by ~15° and is located 
close to the zipper and zinc finger domains (light green). The 
extended -10 element is highlighted in green, while the other 
parts of the DNA are colored as in A. (C) Diagram summarizing 
the mechanism by which gp39 switches the promoter specificity 
of RNAP. The left route shows transcription initiation from 
a -10/-35 promoter. The gp39-bound holoenzyme cannot form 
a stable promoter complex, and thus transcription is prevented. 
The right route shows transcription initiation from an extended 
-10 promoter, which is not affected by gp39. 



Transcription initiation may be inhibited by a few 
typical mechanisms. Transcription repressors directly 
bind DNA to prevent the RNAP from binding to a promoter 
(Pabo and Sauer 1984). Anti-σ factors bind to a σ 
factor and block its interaction with RNAP to prevent 
complete holoenzyme formation (Campbell et al. 2003; 
Lambert et al. 2004; Sorenson et al. 2004; Baxter et al. 
2006). Some anti-σ factors also interfere with the interaction 
between σ and DNA by blocking the DNA-binding 
site of σ. The T7 gp2 protein binds and occludes 
the DNA-binding channel of RNAP (Camara et al. 2010; 
Nechaev and Severinov 2003). In contrast, the present 
study revealed that the phage protein gp39 functions via 
a novel, unanticipated mechanism by repositioning the 
promoter-binding determinant (σ4) without interacting 
with DNA, dissociating the host σ factor from RNAP, 
masking the DNA-binding determinants of σ, or occluding 
the RNAP channels. The repositioning prevents promoter 
complex formation on the -10/-35 promoters and 
thus shuts off most of the host's genes. However, it has 
a much smaller effect on the extended -10 promoters 
that drive the transcription of the middle and late genes of 
the phage (Berdygulova et al. 2011). The mechanism of 
transcription inhibition by gp39 does not involve significant 
conformational reorganization of σ4. Thus, the 
conformational modification of the RNAP holoenzyme 
by the small phage protein is sufficient to switch the 
entire gene transcription profile in the infected bacterial 
cells. 

Similarly to gp39, the anti-σ factor AsiA from bacteriophage 
T4 inhibits transcription by E. coli RNAP from 
-10/-35 promoters but does not inhibit transcription 
from extended -10 promoters (Severinova et al. 1998). 
AsiA binds to the σ4 domain of the E. coli σ70 RNAP 
subunit to occlude the interface between σ4 and the β-flap 
tip. It also induces a significant conformational change in 
σ4 to deform the DNA-binding site (Lambert et al. 2004). 
On the other hand, AsiA also tightly binds to the β-flap 
tip, and this interaction is required for efficient transcription 
inhibition, suggesting that AsiA serves as a bridge 
between σ4 and the β-flap tip (Yuan et al. 2009). Therefore, 
the potential AsiA-induced relocation of σ4 could 
contribute to the transcription inhibition, as in the case of 
gp39. AsiA is not only a transcription inhibitor but also 
a coactivator protein required for T4 middle gene transcription 
dependent on the activator protein MotA. AsiA 
binds to both σ4 and MotA, and then MotA binds 
the "MotA box" DNA sequence around position -30. 
The potential relocation of σ4 by AsiA could facilitate the 
efficient interaction between σ4 and MotA to achieve the 
appropriation of σ by MotA/AsiA. The structure of the E. 
coli RNAP complex with AsiA will clarify this point. The 
repositioning of σ4 might be a widely used phage protein 
strategy to switch the host's gene expression. 

Intriguingly, gp39 is a bifunctional regulator: Besides its 
initiation inhibitory activity, gp39 also has transcription 
anti-termination activity (Berdygulova et al. 2012). The 
deletion of the gp39 C-terminal tail, which is critical for 
inhibiting transcription initiation, does not affect the 
gp39 anti-termination activity (Fig. 5). Therefore, the 
gp39 C-terminal tail is dispensable for the anti-termination 
activity, consistent with the observation that the 
C-terminal tail is fixed only in the context of the σ-containing 
holoenzyme. The anti-termination activity thus 
requires the gp39 core to bind to the RNAP β-flap 
domain, a known modulator of transcription elongation 
and termination efficiency. Ongoing structural analyses 
of gp39-bound transcription elongation complexes should 
clarify the structural basis of this alternative activity of 
the bifunctional phage transcription regulator. 
Materials and methods 

Protein preparation 

For the crystallization of RNAP, the RNAP holoenzyme was 
purified from T. thermophilus cells, as described previously 
(Vassylyev et al. 2002). For the biochemical analyses, T. thermophilus 
RNAP core and holoenzymes with the C-terminally 
10His-tagged β′ subunit were purified from the T. thermophilus 
HB8 rpoC::10H strain, as described (Sevostyanova et al. 2007). 
E. coli core RNAP was purchased from Epicentre. Hybrid E. coli 
RNAPs (Eco-tthflap and Eco-tthflap D725W) were created as follows. 
The E. coli β flap (residues 831-1060) was replaced with the 
T. thermophilus β flap (residues 702-833) by site-directed mutagenesis 
in the plasmid pIA545, bearing the E. coli rpoB gene. The 
D725W substitution (T. thermophilus numbering, corresponding 
to position 853 in the E. coli β subunit) was introduced into the β 
flap in the hybrid RNAP by site-directed mutagenesis. Both 
variants of the hybrid rpoB gene were recloned into the plasmid 
pIA679, which encodes all E. coli core RNAP subunits (the rpoA, 
rpoB, rpoC, and rpoZ genes), with a 6His tag appended to the C 
terminus of the β subunit. The resulting hybrid RNAPs were 
expressed in E. coli BL21(DE3) and purified as described 
previously (Borukhov and Goldfarb 1993; Pupov et al. 2010), 
using Polymin P precipitation followed by chromatography on 
Ni2+-equilibrated HiTrap chelating, HiTrap heparin, and Superose-6 
columns (GE Healthcare). N-terminally 6His-tagged E. coli σ70 
and T. thermophilus σA subunits were expressed in BL21(DE3) and 
purified as described previously (Borukhov and Goldfarb 1993; 
Sevostyanova et al. 2007). 

For the preparation of gp39 and its variants, the genes encod- 
ing wild-type gp39 and the gp39 variant lacking 28 C-terminal 
amino acid residues (gp39 1-113) were cloned into the pET28a 
vector. To obtain gp39 variants with N-terminal cAMP- 
dependent protein kinase (PKA) sites, the corresponding genes 
were cloned into the appropriately modified pET16b vector. These 
gp39 variants were expressed and purified as described (Berdygulova 
et al. 2012). To obtain the SeMet derivative of wild-type gp39, the 
cells bearing the expression vector were cultured in M9 medium 
containing SeMet (Van Duyne et al. 1993). The protein was 
purified by chromatography on Ni-Sepharose FF, MonoQ, and 
Superdex 75 columns (GE Healthcare Biosciences). For the 
preparation of gp39 (6-109) and gp39 (6-132), the ORFs for these 
variants were cloned into the vector pET-47b, which allows 
expression of the variant proteins with a removable N-terminal 
6His tag (Novagen). While gp39 (6-109) was expressed as a SeMet 
derivative, as described above, gp39 (6-132) was expressed as the 
native protein. The proteins were purified as described above, 
and the His tag was removed with HRV3C protease. 

For the expression of the β-flap domain, its coding region 
(T. thermophilus rpoB 703-830) was cloned into the expression 
vector pGEX-6P, which allows expression of the protein as an 
N-terminal GST fusion (GE Healthcare). The protein was purified on 
a glutathione Sepharose FF column (GE Healthcare), with GST tag 
removal by HRV3C protease, followed by chromatography on 
HiTrap butyl FF and Superdex 75 columns (GE Healthcare). 

Crystallization 

For the cocrystallization of the T. thermophilus RNAP holoenzyme 
and the full-length gp39, the sample solution (5 µM 
holoenzyme + 25 µM full-length gp39) was mixed with the 
equivalent volume of the reservoir solution containing 45% 
tacsimate (pH 6.7). Crystallization was performed by the sitting-drop 
vapor diffusion technique at 20°C. Crystals with dimensions 
of 0.4 × 0.1 × 0.1 mm appeared in 3 wk. 
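
Because the sample and reservoir solutions are mixed in equal volumes, every component starts at half its stock concentration in the drop. The short Python sketch below works through that arithmetic for the holoenzyme•gp39 drop; the helper name and dictionary keys are hypothetical, and the calculation assumes ideal mixing and describes only the initial drop, before vapor diffusion concentrates it toward the reservoir composition.

```python
def mix_equal_volumes(sample, reservoir):
    """Initial drop composition after mixing equal volumes of two solutions.

    Each argument maps a component name to its concentration; a component
    absent from one solution is treated as 0 there.
    """
    components = set(sample) | set(reservoir)
    return {
        name: (sample.get(name, 0.0) + reservoir.get(name, 0.0)) / 2.0
        for name in components
    }


# Stock concentrations quoted in the text for the holoenzyme-gp39 drop.
sample = {"holoenzyme_uM": 5.0, "gp39_uM": 25.0}
reservoir = {"tacsimate_percent": 45.0}

drop = mix_equal_volumes(sample, reservoir)
for name in sorted(drop):
    print(name, drop[name])
```

Under these assumptions the drop starts at 2.5 µM holoenzyme, 12.5 µM gp39 (a fivefold molar excess over the holoenzyme), and 22.5% tacsimate.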



For the cocrystallization of the (3-flap domain and gp39 (6- 
109), the sample solution (50 |xM £ flap + 50 |xM gp39 6-109) was 
mixed with the equivalent amount of the reservoir solution, 
containing 50 mM Tris-HCl buffer (pH 8.0), 12% polyethylene 
glycol (PEG) 8000, 8% 2-propanol, and 3% 1,6-hexanediol. Plate- 
like crystals appeared in a week. 

For the cocrystallization of the β-flap domain and gp39 (6-132), 
the sample solution (50 µM β-flap + 50 µM gp39 [6-132]) was 
mixed with the equivalent amount of the reservoir solution, 
containing 50 mM MES-NaOH buffer (pH 5.6), 6% PEG 8000, 
200 mM KCl, and 10 mM MgCl₂. Crystals with dimensions of 
0.15 × 0.05 × 0.05 mm appeared in 3 mo. 
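
Because each drop was set up by mixing equal volumes of sample and reservoir solution, every component starts at half its stock concentration before vapor diffusion concentrates the drop. The sketch below works through that arithmetic for the holoenzyme and gp39 drop. The function name and the dictionary keys are illustrative assumptions, and buffer components of the sample solution, which are not listed in this section, are ignored.

```python
def initial_drop_concentrations(sample, reservoir, sample_vol=1.0, reservoir_vol=1.0):
    """Return component concentrations immediately after mixing.

    Each value keeps the unit it was given in (uM, mM, or percent).
    The 1:1 default mirrors the "equivalent volume" wording of the text;
    the absolute drop volume is not stated there and cancels out anyway.
    """
    total = sample_vol + reservoir_vol
    mixed = {name: conc * sample_vol / total for name, conc in sample.items()}
    for name, conc in reservoir.items():
        mixed[name] = mixed.get(name, 0.0) + conc * reservoir_vol / total
    return mixed


# Stock values taken from the holoenzyme + full-length gp39 setup.
holo_drop = initial_drop_concentrations(
    {"holoenzyme_uM": 5.0, "gp39_uM": 25.0},
    {"tacsimate_percent": 45.0},
)
print(holo_drop)
```

The same helper applies to the two β-flap drops by substituting their sample and reservoir compositions.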



Data collection and structure determination of holo•gp39 

The X-ray diffraction data set for the native holo•gp39 crystal 
was obtained at beamline BL41XU of SPring-8 (Table 1). The 
SAD data set for the SeMet-containing crystal, in which the 
SeMet derivative of gp39 was complexed with the native 
holoenzyme, was collected at beamline NE3A of the Photon 
Factory. For the data collection, the crystallization solution con- 
taining 20% glycerol was used as the cryoprotectant. These data 
were processed with the HKL2000 software package (Otwinowski 
and Minor 1997). 

The holo•gp39 crystal belongs to the space group P3₂21, with 
unit cell dimensions of a = b = 295 Å and c = 223 Å. The 
asymmetric unit contains one holoenzyme and two gp39 mole- 
cules. The structure was solved by molecular replacement with 
the program PHASER (McCoy et al. 2007) using the coordinates 
of the T. thermophilus holoenzyme structure (Protein Data Bank 
[PDB] 1IW7) (Vassylyev et al. 2002) as the search model. The 
holoenzyme structure was divided into ~30 rigid bodies, and 
their positions were manually modified with the program COOT 
(Emsley et al. 2010) and refined with the program CNS (Brunger 
2007). For the tip portion of the nonconserved domain (NCD), 
the coordinates of that region of the T. thermophilus holoenzyme 
(PDB 3DXJ) (Mukhopadhyay et al. 2008) were used. After the 
placement of the RNAP model, extra electron density remained, 
corresponding to the gp39 molecules. One gp39 molecule was 
associated with the β-flap domain of RNAP, as seen in the 
β-flap•gp39 (6-109) structure. Therefore, we placed the model 
of β-flap•gp39 (6-109) within the electron density by superimposing 
the β-flap domains. This positioning of gp39 was 
confirmed to be correct by the identification of the Se anomalous 
peaks corresponding to Met38, Met129, and Met132 of gp39 in 
the anomalous difference map using the SAD data. We further 
built the model of the gp39 C-terminal helix in the 2Fo - Fc 
electron density map of the native crystal by using the positional 
information of Met129 and Met132 from the SAD data and 
secondary structure prediction with Phyre (Kelley and Sternberg 
2009). Finally, we copied the coordinates of this gp39 molecule 
and manually placed them into the electron density for the 
second molecule (gp39_2). We observed an anomalous peak for 
Met38 of the second gp39 molecule but not for Met129 and 
Met132, probably because of the flexibility of the C-terminal 
helix. The structure was refined with the programs COOT, CNS, 
PHENIX, and Refmac5 (Vagin et al. 2004; Brunger 2007; Adams 
et al. 2010; Emsley et al. 2010; Winn et al. 2011). Restraints for 
structural refinement by Refmac5 were generated by ProSMART 
(Nicholls et al. 2012) using the structures of the T. thermophilus 
RNAP holoenzyme (PDB 2A6H) and β-flap•gp39 (6-109) as the 
reference models. The structure was refined at 3.6 Å to R and 
Rfree values of 0.258 and 0.282, respectively. Structural superimposition 
with other RNAP coordinates (Zhang et al. 1999, 2012; 
Cramer et al. 2001; Murakami et al. 2002a; Vassylyev et al. 2002, 
2007; Tagami et al. 2010; Weixlbaumer et al. 2013) was accomplished 
with the CCP4 program suite (Winn et al. 2011). 
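
As a rough plausibility check on the stated cell contents, the unit cell volume and Matthews coefficient can be estimated from the reported cell dimensions. The sketch below assumes hexagonal axes (γ = 120°) and six asymmetric units per cell, as is standard for space group P3₂21. The complex mass used here is a placeholder assumption, not a value reported in the paper, so the resulting numbers are illustrative only.

```python
import math


def hexagonal_axes_cell_volume(a, c):
    """Cell volume (cubic angstroms) for a = b, alpha = beta = 90, gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume, asu_per_cell, mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (asu_per_cell * mass_da)


def solvent_fraction(vm):
    """Matthews' approximation: protein occupies about 1.23 / V_M of the cell."""
    return 1.0 - 1.23 / vm


ASU_PER_CELL = 6             # general-position multiplicity of P3(2)21
ASSUMED_MASS_DA = 480_000.0  # ASSUMPTION: placeholder mass for one holoenzyme + two gp39

volume = hexagonal_axes_cell_volume(295.0, 223.0)
vm = matthews_coefficient(volume, ASU_PER_CELL, ASSUMED_MASS_DA)
print(f"V = {volume:.3e} A^3, V_M = {vm:.2f} A^3/Da, solvent = {solvent_fraction(vm):.0%}")
```

Substituting the actual molecular mass of the complex changes the Matthews coefficient proportionally; the cell volume depends only on the reported dimensions.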

Data collection and structure determination 
of β-flap•gp39 (6-109) 

The cocrystals of β-flap•gp39 (6-109) consisted of the native 
protein of the β-flap domain and the SeMet derivative of gp39 (6-109). 
The crystals were immersed in the crystallization solution 
containing 35% glycerol as the cryoprotectant and were flash- 
cooled in liquid nitrogen. A SAD data set was collected at 
beamline BL32XU of SPring-8 (Table 1). The data were indexed, 
integrated, and scaled with the HKL2000 program (Otwinowski 
and Minor 1997). The space group is V^{1{1, with unit cell 
dimensions of a = b = 99.2 Å and c = 117.2 Å. The asymmetric 
unit contains two pairs of β-flap•gp39 (6-109). Heavy-atom 
searches and phase calculations were performed by the MRSAD 
method using the program PHASER (McCoy et al. 2007) driven 
by PHENIX AutoSol (Adams et al. 2010). The coordinates of the 
β-flap domain in the T. thermophilus holoenzyme structure 
(PDB 1IW7) (Vassylyev et al. 2002) were used as the search model 
in the MRSAD method. The structure was refined at 2.35 Å to R 
and Rfree values of 0.191 and 0.238, respectively, by using COOT 
(Emsley et al. 2010) and PHENIX (Adams et al. 2010). 

Data collection and structure determination 
of β-flap•gp39 (6-132) 

The cocrystals of β-flap•gp39 (6-132) were immersed in the 
crystallization solution containing 35% glycerol as the cryopro- 
tectant and were flash-cooled in liquid nitrogen. X-ray diffraction 
data were collected at beamline NE3A at the Photon Factory 
(Table 1). The data were indexed, integrated, and scaled with the 
HKL2000 software package (Otwinowski and Minor 1997). The 
crystal belongs to the space group R^{1{1, with unit cell dimensions 
of a = b = 213.8 Å and c = 234.6 Å. The asymmetric unit 
contained 12 complexes. The structure was solved by molecular 
replacement with the program PHASER (McCoy et al. 2007) 
using the coordinates of the β-flap•gp39 (6-109) complex as the 
search model. The structure was refined at 3.3 Å to R and Rfree 
values of 0.223 and 0.249, respectively, by using COOT (Emsley 
et al. 2010) and PHENIX (Adams et al. 2010). 

Ultracentrifugation analysis 

To investigate the oligomeric state of gp39 in solution, a sedimentation 
equilibrium experiment was performed using an 
analytical ultracentrifuge (Optima XL-I, Beckman Coulter). 
Analytical cells with a six-channel centerpiece were used, with 
each channel filled with 100 µL of sample or 110 µL of reference 
solution. The protein was dissolved in 10 mM Tris-HCl buffer 
(pH 8.0) containing 200 mM NaCl, and concentrations of 0.8, 0.4, 
and 0.2 mg/mL were examined. An eight-position rotor (An-50 
Ti) was rotated at 20,000, 22,000, and 24,000 rpm (average g 
forces of ~29,000, 35,000, and 42,000, respectively) at 20°C. Data 
were collected after 12, 14, and 16 h of revolution at each speed 
by observing the absorbance at 280 nm. Finally, the protein 
samples were completely sedimented at 40,000 rpm (106,000g) 
for 6 h to generate the baseline. Data were analyzed with the XL- 
A/XL-I data analysis software version 6.03 using a partial specific 
volume of 0.743 mL/g and a solvent density of 1.007 g/mL. 
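
The relation between rotor speed and g force, and the buoyancy term that enters the equilibrium analysis, can be sketched as follows. The radius of 6.5 cm is an assumption chosen because it roughly reproduces the quoted average g forces; it is not stated in the text. The molar mass passed to `reduced_mass_sigma` in the usage lines is likewise a hypothetical placeholder. For an ideal single species, the equilibrium profile follows c(r) = c(r0) exp[σ (r² - r0²) / 2] with σ = M (1 - v̄ρ) ω² / (RT).

```python
import math

G0 = 9.80665         # standard gravity, m/s^2
R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def angular_velocity(rpm):
    """Convert revolutions per minute to radians per second."""
    return 2.0 * math.pi * rpm / 60.0


def rcf(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given radius."""
    return angular_velocity(rpm) ** 2 * (radius_cm / 100.0) / G0


def buoyancy_factor(vbar_ml_per_g, rho_g_per_ml):
    """Dimensionless term (1 - vbar * rho)."""
    return 1.0 - vbar_ml_per_g * rho_g_per_ml


def reduced_mass_sigma(mass_g_per_mol, rpm, vbar, rho, temp_k=293.15):
    """sigma = M (1 - vbar*rho) omega^2 / (R T), in 1/m^2 (SI units)."""
    mass_kg_per_mol = mass_g_per_mol * 1e-3
    return (mass_kg_per_mol * buoyancy_factor(vbar, rho)
            * angular_velocity(rpm) ** 2 / (R_GAS * temp_k))


ASSUMED_RADIUS_CM = 6.5  # ASSUMPTION: not given in the text
for speed in (20_000, 22_000, 24_000):
    print(speed, "rpm ->", round(rcf(speed, ASSUMED_RADIUS_CM)), "x g")

print("buoyancy factor:", buoyancy_factor(0.743, 1.007))
# HYPOTHETICAL molar mass, for illustration only.
print("sigma (1/m^2):", reduced_mass_sigma(16_000.0, 20_000, 0.743, 1.007))
```

Fitting σ to the measured absorbance profiles at each speed and comparing the implied mass with the sequence mass is the usual way to infer the oligomeric state; the fitting itself was done with the instrument software cited above.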

Native gel binding assay 

Native gel binding assays with E. coli RNAP core and holoenzymes 
and wild-type or mutant gp39 were performed as described 
(Berdygulova et al. 2012). To compare the efficiencies of the 
interactions of wild-type and mutant gp39 with the E. coli RNAP 
core and holoenzymes, gp39 variants were radiolabeled with 
[γ-³²P]-ATP at their PKA sites using the PKA enzyme (New 
England Biolabs) according to the manufacturer's protocol and 
were used in the native gel binding experiments under the same 
conditions. 

Bacterial two-hybrid assay 

Bacterial two-hybrid assays for analyses of gp39•RNAP interactions 
were performed using the E. coli strain FW102 OL2-62, 
containing the lacZ gene under the control of the test promoter 
placOR2-62 on an F' episome, as described previously (Nickels 
2009; Berdygulova et al. 2012). Plasmids containing gene fusions 
of gp39 (wild type, a variant with the 94W/95F double substitution, 
and a C-terminally truncated variant [residues 1-122]), 
the T. thermophilus β-flap (wild-type [residues 703-830] and 
a variant with the 725W substitution), and the σ4 domain 
(residues 337-423) were obtained by cloning the corresponding 
PCR fragments between the NotI and BamHI sites into the 
plasmids pBRαLN (encoding the N-terminal domain of the 
E. coli RNAP α subunit) and pACλCI32 (encoding the DNA-binding 
domain of the λCI protein). Empty vector plasmids were 
used as negative controls. 

In vitro transcription 

Abortive transcription initiation reactions were performed in 
transcription buffer containing 40 mM Tris-HCl (pH 7.9), 40 mM 
KCl, and 10 mM MgCl₂ at 37°C for the E. coli and Eco-tth^flap 
RNAPs and at 55°C for the T. thermophilus RNAP. The 
concentrations of the RNAPs and promoters were 0.1-1 and 
0.05-0.5 µM, respectively. The gp39 concentrations in the 
transcription reactions were 0.03, 0.1, 0.3, 1, and 3 µM (Fig. 3G); 1.2 
and 6 µM (Fig. 4C); and 5 µM (Fig. 4D). Transcription was initiated 
by the addition of dinucleotide primers and the [α-³²P]-NTP, 
corresponding to the next promoter position, to preformed 
promoter complexes and was stopped after 5-10 min by the addition 
of an equal volume of urea-formamide loading buffer. RNA 
products were resolved in 20% polyacrylamide denaturing gels. 
Analyses of transcription anti-termination by gp39 were 
performed on a DNA fragment containing the T7A1 promoter 
followed by the tR2 terminator of phage λ, as described previously 
(Berdygulova et al. 2012). 

Footprinting 

For in vitro footprinting experiments, dsDNA promoter fragments 
were prepared from 100-nucleotide oligonucleotides 
corresponding to the -65/+35 promoter positions, as described 
(Mekler et al. 2011). In each case, either the nontemplate or 
template strand oligonucleotide was labeled with [γ-³²P]-ATP at 
its 5' end. The labeled promoter fragments were purified on 
Micro Bio-spin 6 columns (Bio-Rad) and used for the assays at 
0.05-0.1 µM concentrations. 

DNase I footprinting 

Promoter complexes were formed with 
0.2 µM Eco-tth^flap RNAP, 0.6 µM gp39, and 0.05 µM promoter 
DNA in 10 µL of transcription buffer. In most experiments, 
RNAP was preincubated with gp39 for 10 min at 37°C prior to 
the addition of DNA; in some reactions, DNA was added before 
gp39. The reactions were incubated for 10 min at 37°C followed 
by the addition of 0.2 U of DNase I (New England Biolabs). After 
60 sec at 37°C, the reactions were stopped by the addition of 
10 µg of calf thymus DNA in 10 µL of water followed by ethanol 
precipitation. The samples were dissolved in 8 µL of urea-formamide 
loading buffer and resolved in 7% polyacrylamide 
denaturing gels. 
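
The stated concentrations translate directly into absolute amounts and molar ratios in the 10 µL reaction, since µM multiplied by µL gives pmol. The short sketch below does this conversion for the DNase I footprinting reaction; the helper name and dictionary labels are illustrative assumptions.

```python
def picomoles(conc_uM, volume_uL):
    """Amount in pmol: 1 uM x 1 uL = 1e-6 mol/L x 1e-6 L = 1e-12 mol."""
    return conc_uM * volume_uL


REACTION_VOLUME_UL = 10.0
# Concentrations (uM) from the DNase I footprinting setup.
reaction = {"RNAP": 0.2, "gp39": 0.6, "promoter DNA": 0.05}

amounts_pmol = {name: picomoles(conc, REACTION_VOLUME_UL)
                for name, conc in reaction.items()}
# Molar excess of each component over the promoter DNA.
excess_over_dna = {name: conc / reaction["promoter DNA"]
                   for name, conc in reaction.items()}

for name in reaction:
    print(f"{name}: {amounts_pmol[name]:.2f} pmol, "
          f"{excess_over_dna[name]:.0f}x relative to DNA")
```

Under these conditions RNAP is in molar excess over the promoter fragment and gp39 is in molar excess over RNAP, which is the usual arrangement when the aim is to saturate the DNA with gp39-bound polymerase.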

ExoIII footprinting 

Promoter complexes were obtained as described 
above and treated with 0.5 U of ExoIII (New England 
Biolabs) for 60 sec at 37°C. The reactions were stopped by the 
addition of calf thymus DNA and processed as described above. 

KMnO₄ probing 

The reactions were prepared as described above, 
except the concentrations of Eco-tth^flap RNAP and gp39 were 0.1 
µM and 0.06-0.3 µM, respectively. Promoter complexes were 
treated with 2 mM KMnO₄ for 50 sec at 37°C. The reactions were 
then stopped by the addition of 30 mM β-mercaptoethanol 
followed by ethanol precipitation and treatment with 10% piperidine 
for 20 min at 95°C. The samples were treated with 
chloroform, ethanol-precipitated, dissolved in 8 µL of loading 
buffer, and resolved in 7% polyacrylamide denaturing gels. 

Database deposition 

The coordinates and structure-factor amplitudes have been deposited 
in PDB under accession numbers 3WOD (holo•gp39), 
3WOE (β-flap•gp39 [6-109]), and 3WOF (β-flap•gp39 [6-132]). 